{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# 交叉编译与RPC \n",
    "参考：[tutorial-cross-compilation-and-rpc](https://tvm.apache.org/docs/how_to/tutorials/cross_compilation_and_rpc.html#tutorial-cross-compilation-and-rpc)\n",
    "\n",
    "本教程介绍了在 TVM 中使用 RPC 进行交叉编译和远程设备执行。\n",
    "\n",
    "通过交叉编译和 RPC，您可以 **在本地机器上编译程序，然后在远程设备上运行它**。当远程设备资源有限时，例如树莓派和移动平台，这一方法非常有用。在本教程中，将使用树莓派作为 CPU 示例，以及 Firefly-RK3399 作为 OpenCL 示例。"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 在设备上构建 TVM 运行时\n",
    "\n",
    "首先，需要在远程设备上构建 TVM 运行时。\n",
    "```{note}\n",
    ":class: alert alert-info\n",
    "本节和下一节中的所有指令都应该在目标设备上执行，例如树莓派。假设目标是运行 Linux 系统。\n",
    "```\n",
    "\n",
    "由于在本地机器上进行编译，远程设备仅用于运行生成的代码。因此，仅需在远程设备上构建 TVM 运行时环境。\n",
    "\n",
    "```bash\n",
    "git clone --recursive https://github.com/apache/tvm tvm\n",
    "cd tvm\n",
    "make runtime -j2\n",
    "```"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "成功构建运行环境后，需要在 {code}`~/.bashrc` 文件中设置环境变量。可以使用 {code}`vi ~/.bashrc` 命令编辑 {code}`~/.bashrc` 文件，并添加以下内容（假设您的 TVM 目录位于 {code}`~/tvm`）：\n",
    "```bash\n",
    "export PYTHONPATH=$PYTHONPATH:~/tvm/python\n",
    "```\n",
    "\n",
    "要更新环境变量，请执行：`source ~/.bashrc`。"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 在设备上设置 RPC 服务器\n",
    "\n",
    "要在远程设备（本例中为 Raspberry Pi）上启动 RPC 服务器，请运行以下命令。\n",
    "\n",
    "```bash\n",
    "python -m tvm.exec.rpc_server --host 0.0.0.0 --port=9090\n",
    "```\n",
    "\n",
    "如果你看到下面的这行文字，意味着 RPC 服务器在你的设备上成功启动了。\n",
    "\n",
    "```bash\n",
    "INFO:root:RPCServer: bind to 0.0.0.0:9090\n",
    "```"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 在本地机器上声明和交叉编译内核\n",
    "\n",
    "```{note}\n",
    ":class: alert alert-info\n",
    "\n",
    "现在回到本地机器，该机器已完全安装了 TVM（包括 LLVM）。\n",
    "```\n",
    "\n",
    "在这里，我们将在本地机器上声明简单的内核："
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "import numpy as np\n",
    "\n",
    "import tvm\n",
    "from tvm import te\n",
    "from tvm import rpc\n",
    "from tvm.contrib import utils\n",
    "\n",
    "n = tvm.runtime.convert(1024)\n",
    "A = te.placeholder((n,), name=\"A\")\n",
    "B = te.compute((n,), lambda i: A[i] + 1.0, name=\"B\")\n",
    "s = te.create_schedule(B.op)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "接下来，进行内核的交叉编译。\n",
    "\n",
    "对于 Raspberry Pi 3B，目标应该是 `'llvm -mtriple=armv7l-linux-gnueabihf'`，但在这里使用 `'llvm'` 以便本教程能在网页构建服务器上运行。请查看以下部分的详细注释。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "local_demo = True\n",
    "\n",
    "if local_demo:\n",
    "    target = \"llvm\"\n",
    "else:\n",
    "    target = \"llvm -mtriple=armv7l-linux-gnueabihf\"\n",
    "\n",
    "func = tvm.build(s, [A, B], target=target, name=\"add_one\")\n",
    "# 将库保存到本地临时文件夹\n",
    "temp = utils.tempdir()\n",
    "path = temp.relpath(\"lib.tar\")\n",
    "func.export_library(path)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "````{note}\n",
    ":class: alert alert-info\n",
    "\n",
    "请注意，为了在真实远程设备上运行本教程，请将 `local_demo` 设置为`False`，并将 `build` 中的 `target` 替换为您设备的适当目标三元组。不同设备的目标三元组可能有所不同。例如，对于树莓派3B，它是 `llvm -mtriple=armv7l-linux-gnueabihf`；而对于 RK3399，则是 `llvm -mtriple=aarch64-linux-gnu`。\n",
    "\n",
    "通常，您可以在设备上运行 `gcc -v` 来查询目标，并查找以 `Target:` 开头的行（尽管这可能仍是大致的配置）。\n",
    "\n",
    "除了 `-mtriple`，您还可以设置其他编译选项，如：\n",
    "\n",
    "* `-mcpu=<cpuname>`\n",
    "  指定当前架构中特定的芯片以生成代码。默认情况下，这是从目标三元组推断出来的，并自动检测到当前架构。\n",
    "* `-mattr=a1,+a2,-a3,...`\n",
    "  覆盖或控制目标的特定属性，例如是否启用SIMD操作。默认的属性集由当前CPU设置。\n",
    "  要获取可用属性的列表，您可以执行：\n",
    "\n",
    "```bash\n",
    "llc -mtriple=<您的设备目标三元组> -mattr=help\n",
    "```\n",
    "\n",
    "这些选项与[llc](http://llvm.org/docs/CommandGuide/llc.html)保持一致。建议设置目标三元组和功能集以包含特定的可用功能，这样我们就可以充分利用板卡的功能。\n",
    "\n",
    "您可以从[LLVM跨平台编译指南](https://clang.llvm.org/docs/CrossCompilation.html)中找到有关交叉编译属性的更多详细信息。\n",
    "````"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 通过 RPC 远程运行CPU内核\n",
    "\n",
    "将展示如何在远程设备上运行生成的 CPU 内核。\n",
    "\n",
    "首先，从远程设备获取 RPC 会话。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "if local_demo:\n",
    "    remote = rpc.LocalSession()\n",
    "else:\n",
    "    # 以下是我的环境，请将其更改为您的目标设备 IP 地址。\n",
    "    host = \"10.77.1.162\"\n",
    "    port = 9090\n",
    "    remote = rpc.connect(host, port)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "将库上传到远程设备，然后调用设备本地的编译器重新链接它们。现在 `func` 是远程模块对象。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "2024-11-26 15:27:49.826 INFO load_module /tmp/tmplbfhxmbe/lib.tar\n"
     ]
    }
   ],
   "source": [
    "remote.upload(path)\n",
    "func = remote.load_module(\"lib.tar\")\n",
    "\n",
    "# create arrays on the remote device\n",
    "dev = remote.cpu()\n",
    "a = tvm.runtime.tensor(np.random.uniform(size=1024).astype(A.dtype), dev)\n",
    "b = tvm.runtime.tensor(np.zeros(1024, dtype=A.dtype), dev)\n",
    "# the function will run on the remote device\n",
    "func(a, b)\n",
    "np.testing.assert_equal(b.numpy(), a.numpy() + 1)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "当你想评估远程设备上内核的性能时，重要的是要避免网络开销。`time_evaluator` 将返回远程函数，该函数在远程设备上多次运行该函数，测量每次运行的成本，并返回测量到的成本。网络开销被排除在外。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "3.097e-07 secs/op\n"
     ]
    }
   ],
   "source": [
    "time_f = func.time_evaluator(func.entry_name, dev, number=10)\n",
    "cost = time_f(a, b).mean\n",
    "print(f\"{cost:g} secs/op\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 通过 RPC 远程运行 OpenCL 内核\n",
    "\n",
    "对于远程 OpenCL 设备，工作流程几乎与上述相同。\n",
    "\n",
    "您可以定义内核，上传文件，并通过 RPC 运行。\n",
    "\n",
    "```{note}\n",
    ":class: alert alert-info\n",
    "树莓派不支持 OpenCL，以下代码已在 Firefly-RK3399 上测试。您可以参考此 [教程](https://gist.github.com/mli/585aed2cec0b5178b1a510f9f236afa2)为RK3399设置操作系统和OpenCL驱动。\n",
    "```\n",
    "\n",
    "还需要构建在rk3399板上启用了OpenCL的运行时。在 TVM 根目录下执行以下命令：\n",
    "\n",
    "```bash\n",
    "cp cmake/config.cmake .\n",
    "sed -i \"s/USE_OPENCL OFF/USE_OPENCL ON/\" config.cmake\n",
    "make runtime -j4\n",
    "```\n",
    "\n",
    "以下函数展示了如何远程运行 OpenCL 内核："
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false,
    "tags": [
     "hide-cell"
    ]
   },
   "outputs": [],
   "source": [
    "def run_opencl():\n",
    "    # NOTE: This is the setting for my rk3399 board. You need to modify\n",
    "    # them according to your environment.\n",
    "    opencl_device_host = \"10.77.1.145\"\n",
    "    opencl_device_port = 9090\n",
    "    target = tvm.target.Target(\"opencl\", host=\"llvm -mtriple=aarch64-linux-gnu\")\n",
    "\n",
    "    # create schedule for the above \"add one\" compute declaration\n",
    "    mod = tvm.IRModule.from_expr(te.create_prim_func([A, B]))\n",
    "    sch = tvm.tir.Schedule(mod)\n",
    "    (x,) = sch.get_loops(block=sch.get_block(\"B\"))\n",
    "    xo, xi = sch.split(x, [None, 32])\n",
    "    sch.bind(xo, \"blockIdx.x\")\n",
    "    sch.bind(xi, \"threadIdx.x\")\n",
    "    func = tvm.compile(sch.mod, target=target)\n",
    "\n",
    "    remote = rpc.connect(opencl_device_host, opencl_device_port)\n",
    "\n",
    "    # export and upload\n",
    "    path = temp.relpath(\"lib_cl.tar\")\n",
    "    func.export_library(path)\n",
    "    remote.upload(path)\n",
    "    func = remote.load_module(\"lib_cl.tar\")\n",
    "\n",
    "    # run\n",
    "    dev = remote.cl()\n",
    "    a = tvm.runtime.tensor(np.random.uniform(size=1024).astype(A.dtype), dev)\n",
    "    b = tvm.runtime.tensor(np.zeros(1024, dtype=A.dtype), dev)\n",
    "    func(a, b)\n",
    "    np.testing.assert_equal(b.numpy(), a.numpy() + 1)\n",
    "    print(\"OpenCL test passed!\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 小结\n",
    "\n",
    "本教程提供了TVM中跨平台编译和远程过程调用（RPC）功能的详细介绍。\n",
    "\n",
    "- 在远程设备上设置 RPC 服务器。\n",
    "- 配置目标设备，以便在本地机器上交叉编译内核。\n",
    "- 通过 RPC API 上传并远程运行内核。"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "ai",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.12.7"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 0
}
